Skip to content

Add support for reading PDBs from AF2 database#102

Merged
rasbt merged 15 commits intoBioPandas:mainfrom
a-r-j:af2
May 12, 2022
Merged

Add support for reading PDBs from AF2 database#102
rasbt merged 15 commits intoBioPandas:mainfrom
a-r-j:af2

Conversation

@a-r-j
Copy link
Contributor

@a-r-j a-r-j commented Apr 30, 2022

Description

Simple PR. Adds an equivalent method (fetch_af2) to fetch_pdb to interact with structures in the AF DB via their UniProt ID

Related issues or pull requests

None

Pull Request Checklist

  • Added a note about the modification or contribution to the ./docs/sources/CHANGELOG.md file (if applicable)
  • Added appropriate unit test functions in the ./biopandas/*/tests directories (if applicable)
  • Modify documentation in the corresponding Jupyter Notebook under biopandas/docs/sources/ (if applicable)
  • Ran PYTHONPATH='.' pytest ./biopandas -sv and make sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., PYTHONPATH='.' pytest ./biopandas/classifier/tests/test_stacking_cv_classifier.py -sv)
  • Checked for style issues by running flake8 ./biopandas

@pep8speaks
Copy link

pep8speaks commented Apr 30, 2022

Hello @a-r-j! Thanks for updating this PR.

Line 70:89: E501 line too long (113 > 88 characters)
Line 71:89: E501 line too long (123 > 88 characters)
Line 77:89: E501 line too long (104 > 88 characters)
Line 80:89: E501 line too long (116 > 88 characters)
Line 83:89: E501 line too long (125 > 88 characters)
Line 94:89: E501 line too long (107 > 88 characters)
Line 99:39: E203 whitespace before ':'
Line 100:89: E501 line too long (97 > 88 characters)
Line 101:41: E203 whitespace before ':'
Line 102:89: E501 line too long (101 > 88 characters)
Line 113:89: E501 line too long (118 > 88 characters)
Line 154:89: E501 line too long (102 > 88 characters)

Line 10:1: F401 'urllib.request.urlopen' imported but unused
Line 26:89: E501 line too long (99 > 88 characters)
Line 132:9: F841 local variable 'url' is assigned to but never used
Line 137:89: E501 line too long (95 > 88 characters)

Line 118:5: E303 too many blank lines (2)
Line 118:89: E501 line too long (111 > 88 characters)
Line 119:89: E501 line too long (121 > 88 characters)
Line 125:89: E501 line too long (104 > 88 characters)
Line 128:89: E501 line too long (116 > 88 characters)
Line 131:89: E501 line too long (134 > 88 characters)
Line 142:89: E501 line too long (106 > 88 characters)
Line 145:9: E303 too many blank lines (2)
Line 148:39: E203 whitespace before ':'
Line 149:89: E501 line too long (97 > 88 characters)
Line 150:41: E203 whitespace before ':'
Line 151:89: E501 line too long (101 > 88 characters)
Line 162:89: E501 line too long (121 > 88 characters)
Line 346:89: E501 line too long (102 > 88 characters)

Line 9:1: F401 'urllib.error.URLError' imported but unused
Line 10:1: F401 'urllib.request.urlopen' imported but unused
Line 137:9: F841 local variable 'url' is assigned to but never used
Line 143:89: E501 line too long (93 > 88 characters)

Comment last updated at 2022-05-12 01:49:06 UTC

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@a-r-j
Copy link
Contributor Author

a-r-j commented Apr 30, 2022

@rasbt Will add a note to the changelog but unsure how you want to proceed wrt naming convention.

Copy link
Member

@rasbt rasbt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really cool stuff! Thanks for this!

Btw. in addition to the other suggestions, one thing I couldn't annotate in the review because the files were empty, so I am commenting about this here: do we need the __init__.py files in the test folders? Just curious what they do? (I mean there is maybe an advantage, but could you maybe summarize it?)

@a-r-j
Copy link
Contributor Author

a-r-j commented Apr 30, 2022

I added the __init__s because tests would fail locally for me due to the overlap in the filenames for the mmcif and pdb testsuites. Adding the inits differentiates them in the namespace.

@rasbt
Copy link
Member

rasbt commented Apr 30, 2022

Makes sense, thanks!

Copy link
Member

@rasbt rasbt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!


class PandasMmcif:
def __init__(self, use_auth: bool = True):
def __init__(self, use_auth: bool = True, af2_version: int = 2):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be for specifying which alphafold database to use? I don't think we need this as an __init__ attribute though? Can't it be specified in fetch_mmcif(source="")?

E.g., something like

if source == 'alphafold2-v1':
    af2_version = 1
elif source == 'alphafold2-v2':
    af2_version = 2
else:
    ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, good call I think it belongs somewhere else. The param specifies which release of the database to retrieve the structure from. I don't anticipate it being changed much - the latest version is what I expect 99% of people would use - it was intended to reduce the maintenance burden in between new releases and updating the default.

return self

def fetch_mmcif(self, pdb_code: Optional[str] = None, uniprot_id: Optional[str] = None, source: str = "pdb"):
def fetch_mmcif(self, pdb_code: Optional[str] = None, uniprot_id: Optional[str] = None, source: str = "pdb", af2_version: int = 2):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would work, but i wonder if it wouldn't make more sense to make it part of the string, otherwise, we end up with a very specific argument that only makes sense if users select "alphafold2"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think setting it in the string is still problematic as it would require an update to biopandas to support each new release of the database before it can be used

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I feel like this kind of complicated the user-interface too much. This is a value very specific to one particular "source" choice. Sure, we only have 2 source choices at the moment, but there could be more in the future (e.g., competitors to AF2, potentially AF3 etc.) and then it would become very messy for users to figure out which options go together.

I feel like it would be much simpler to support a fixed number of options for the sake of user-friendliness. If there is a v3 in the future, I don't think it would be much effort to update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I've made the changes :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks!

@rasbt
Copy link
Member

rasbt commented May 12, 2022

Looks good to me know! The documentation currently doesn't fully execute because the part on multiple models is not here yet, but I can take care of that when it is merged into main. So the only thing left is the Changelog note, unless you have anything else you would like to add?
Great PR btw! Thanks a lot!

@a-r-j
Copy link
Contributor Author

a-r-j commented May 12, 2022

Nothing further from me, but I might pick up some loose threads from the open issues over the summer 😁 Thanks for all the help on the PRs - it's super fun working on this with you and I'm learning lots!

@rasbt
Copy link
Member

rasbt commented May 12, 2022

Awesome! 🎉🙌.
I will actually make a new release tonight then so people can start using it 🤗

@rasbt rasbt merged commit dfac9fc into BioPandas:main May 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants